Annotation of Error Types for German Newsgroup Corpus
Abstract
This paper discusses the corpus annotation effort in the FLAG project and its application in the development of controlled language and grammar checking applications. A USENET corpus was collected and annotated using the error typology developed in the project. The DiET tool was used to support the automatic annotation effort, and to evaluate and validate the data. Finally, we report on some interesting aspects of the data which came out of our evaluation.
Similar resources
Tense, Modality and Polarity: The Finite Verbal Group in English and German Newsgroup Texts
This paper describes work in progress on a corpus-based study, comparing seemingly similar registers in two languages: English and German newsgroup texts, collected in the Bremen Translation Corpus. Systemic Functional Grammar (SFG, Halliday 1994 [1985]) provides a theoretical framework for categorizing empirical findings. I will focus on three systems of the finite verbal group, i.e. tense, mo...
EAGLE: an Error-Annotated Corpus of Beginning Learner German
This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and hand-written essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of German that focuses on linguistic propertie...
Annotating Discourse Anaphora
In this paper, we present preliminary work on corpus-based anaphora resolution of discourse deixis in German. Our annotation guidelines provide linguistic tests for locating the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus.
Towards Detecting Annotation Errors in Spoken Language Corpora
Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in part-of-speech and other positional annotation (van Halteren, 2000; Eskin, 2000; Dickinson and Meurers, 2003a), only recently has there been some work in detecting errors in synt...
Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data
We present a VerbNet-based annotation scheme for semantic roles which we explore in an annotation study on German language data that combines word sense and semantic role annotation. We reannotate a substantial portion of the SALSA corpus with GermaNet senses and a revised scheme of VerbNet roles. We provide a detailed evaluation of the interaction between sense and role annotation. The resulti...